feat(test suite): use cgroups to detect if a test leaks processes #6470

Open
wants to merge 18 commits into
base: main

problame
Contributor

@problame problame commented Jan 25, 2024

Epic: #6485

Problem

Some tests leave stray processes behind after they exit.

This is a potential root cause of failed coverage-report generation, as well as other flakiness.

Solution

Before executing a test, create & enter a cgroup.
After the test is done, ensure that there are no processes left in the cgroup.

This banks on the assumption that the tests themselves don't do anything with the cgroup hierarchy, which is currently the case.
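
A minimal sketch of the check described above, assuming a cgroup v2 hierarchy where each cgroup directory exposes a `cgroup.procs` file with one PID per line. The function names, and the choice to pass the cgroup directory in as a parameter, are illustrative and not taken from this PR's code.

```python
from pathlib import Path


def read_cgroup_pids(cgroup_dir: Path) -> list[int]:
    """Return the PIDs listed in <cgroup_dir>/cgroup.procs (one PID per line)."""
    text = (cgroup_dir / "cgroup.procs").read_text()
    return [int(line) for line in text.splitlines() if line.strip()]


def enter_cgroup(cgroup_dir: Path, pid: int) -> None:
    """Create the cgroup directory if needed and move `pid` into it.

    On a real cgroup2 mount, the kernel creates cgroup.procs when the
    directory is created, and writing a PID to it migrates that process.
    """
    cgroup_dir.mkdir(parents=True, exist_ok=True)
    (cgroup_dir / "cgroup.procs").write_text(f"{pid}\n")


def leaked_pids(cgroup_dir: Path, own_pid: int) -> list[int]:
    """PIDs still in the cgroup after the test, excluding the test runner itself."""
    return [pid for pid in read_cgroup_pids(cgroup_dir) if pid != own_pid]
```

Because child processes inherit the cgroup of their parent, anything the test spawned shows up in `cgroup.procs` until it exits, which is what makes the post-test check possible.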

Changes

  • Use NeonEnvBuilder's __enter__ and __exit__ context manager hooks to implement the solution outlined above.
  • Fix the problems discovered using this mechanism:
    • fix(neon_local): leaks compute_ctl child process if get_status() fails
      
      Copy-pasting from #6474 here; as multiple TODO comments in this file
      indicate, we should really be using background_process::start_process
      
    • reorder vanilla_pg and neon_env_builder in one test where we saw failures
      => follow-up stray process check: move out of NeonEnvBuilder into an autouse fixture #6487
    • fix test_neon_two_primary_endpoints_fail; it doesn't add to NeonEnv.endpoints,
      hence it didn't get stopped as part of NeonEnvBuilder.__exit__'s call to NeonEnv.stop.
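
The last bullet can be illustrated with a toy model. This is only a sketch: `ToyEndpoint`, `ToyEnv`, and `ToyEnvBuilder` are hypothetical stand-ins, not the real `NeonEnv` or `NeonEnvBuilder`, and the `started` list stands in for what the cgroup would report as still alive.

```python
class ToyEndpoint:
    """Hypothetical stand-in for a compute endpoint process."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True


class ToyEnv:
    """Hypothetical stand-in for NeonEnv: stop() only reaches registered endpoints."""

    def __init__(self) -> None:
        self.endpoints: list[ToyEndpoint] = []

    def stop(self) -> None:
        for endpoint in self.endpoints:
            endpoint.running = False


class ToyEnvBuilder:
    """Context manager that stops the env on exit, then looks for survivors."""

    def __init__(self) -> None:
        self.env = ToyEnv()
        self.started: list[ToyEndpoint] = []  # stands in for the cgroup's view
        self.leaked: list[str] = []

    def start_endpoint(self, name: str, register: bool = True) -> ToyEndpoint:
        endpoint = ToyEndpoint(name)
        self.started.append(endpoint)
        if register:
            self.env.endpoints.append(endpoint)
        return endpoint

    def __enter__(self) -> "ToyEnvBuilder":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        self.env.stop()  # only stops endpoints the env knows about
        self.leaked = [e.name for e in self.started if e.running]
        return False  # never swallow the test's own exception
```

An endpoint started with `register=False` survives `stop()` and ends up in `leaked`, which mirrors why the unregistered endpoint in that test was never stopped.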

Follow-Ups

@problame problame changed the base branch from problame/iss-6366/refactor to main January 25, 2024 12:44
@problame problame force-pushed the problame/neon-env-builder-cgroup branch from 9a4d27b to 4204a7d Compare January 25, 2024 13:05
@problame problame requested review from bayandin and koivunej January 25, 2024 13:05
Member

@koivunej koivunej left a comment

I think this is looking great!

@koivunej
Member

If we need to re-run until we see a coverage failure, this might be handy: fcf17a3

@koivunej
Member

Permission problems:

==================================== ERRORS ====================================
________ ERROR at teardown of test_pageserver_multiple_keys[debug-pg14] ________
[gw9] linux -- Python 3.9.2 /github/home/.cache/pypoetry/virtualenvs/neon-_pxWMzVK-py3.9/bin/python
/github/home/.cache/pypoetry/virtualenvs/neon-_pxWMzVK-py3.9/lib/python3.9/site-packages/allure_commons/_allure.py:221: in __call__
    return self._fixture_function(*args, **kwargs)
test_runner/fixtures/neon_fixtures.py:1383: in neon_env_builder
    yield builder
test_runner/fixtures/neon_fixtures.py:976: in __exit__
    with open(self.test_cgroup_dir.parent / "cgroup.procs", "a") as f:
E   PermissionError: [Errno 13] Permission denied: '/sys/fs/cgroup/neon_testsuite/cgroup.procs'
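
One way to surface this kind of failure earlier is a preflight check before any test runs. The sketch below is an assumption about how such a check could look, not code from this PR; `check_cgroup_writable` is a hypothetical helper name.

```python
import os
from pathlib import Path


def check_cgroup_writable(cgroup_dir: Path) -> list[str]:
    """Return human-readable problems; an empty list means the cgroup looks usable."""
    problems = []
    procs = cgroup_dir / "cgroup.procs"
    if not cgroup_dir.is_dir():
        problems.append(f"{cgroup_dir} does not exist or is not a directory")
        return problems
    # Creating child cgroups needs write + search permission on the directory.
    if not os.access(cgroup_dir, os.W_OK | os.X_OK):
        problems.append(f"cannot create child cgroups in {cgroup_dir}")
    # Moving processes needs write permission on cgroup.procs.
    if not procs.exists():
        problems.append(f"{procs} is missing")
    elif not os.access(procs, os.W_OK):
        problems.append(f"cannot write {procs}")
    return problems
```

`os.access` also reports a path as not writable when the filesystem is mounted read-only, so the same check would cover a cgroup2 mount that is `ro` inside the container.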


github-actions bot commented Jan 25, 2024

No tests were run or test report is not available

Test coverage report is not available

The comment gets automatically updated with the latest test results
2978c83 at 2024-01-26T14:11:42.442Z :recycle:

@koivunej
Member

koivunej commented Jan 26, 2024

Looking great in https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6470/7660119788/index.html#suites/90de3f9cafdc78be9db0b2ada81f7c26/ea7eab72b6180321:

2024-01-25 20:56:40.802 WARNING [neon_fixtures.py:994] SIGKILLing leaked process: 18949: ['/tmp/neon/bin/compute_ctl', '--http-port', '26120', '--pgdata', '/tmp/test_output/test_local_corruption[debug-pg14]/repo/endpoints/ep-3/pgdata', '--connstr', 'postgresql://cloud_admin@127.0.0.1:26119/postgres', '--spec-path', '/tmp/test_output/test_local_corruption[debug-pg14]/repo/endpoints/ep-3/spec.json', '--pgbin', '/tmp/neon/pg_install/v14/bin/postgres', '']

Was this an injected failure? No.
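
For readers unfamiliar with how such a log line can be produced, here is a rough sketch. It assumes Linux's `/proc/<pid>/cmdline`, which stores the argv as NUL-terminated strings; the trailing `''` in the logged list is consistent with splitting on NUL, but that is an inference. The helper names are hypothetical and not taken from `neon_fixtures.py`.

```python
import os
import signal
from pathlib import Path


def parse_cmdline(raw: bytes) -> list[str]:
    """Split a /proc/<pid>/cmdline blob on NUL; a trailing NUL yields a final ''."""
    return raw.decode(errors="replace").split("\0")


def kill_leaked(pids: list[int], proc_root: Path = Path("/proc")) -> list[str]:
    """SIGKILL each PID and return one log message per process that was killed."""
    messages = []
    for pid in pids:
        try:
            raw = (proc_root / str(pid) / "cmdline").read_bytes()
            os.kill(pid, signal.SIGKILL)
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited on its own in the meantime
        messages.append(f"SIGKILLing leaked process: {pid}: {parse_cmdline(raw)}")
    return messages
```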

@bayandin bayandin marked this pull request as ready for review January 26, 2024 08:32
@bayandin bayandin marked this pull request as draft January 26, 2024 08:33
@bayandin
Member

Looking great in https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6470/7660119788/index.html#suites/90de3f9cafdc78be9db0b2ada81f7c26/ea7eab72b6180321:

2024-01-25 20:56:40.802 WARNING [neon_fixtures.py:994] SIGKILLing leaked process: 18949: ['/tmp/neon/bin/compute_ctl', '--http-port', '26120', '--pgdata', '/tmp/test_output/test_local_corruption[debug-pg14]/repo/endpoints/ep-3/pgdata', '--connstr', 'postgresql://cloud_admin@127.0.0.1:26119/postgres', '--spec-path', '/tmp/test_output/test_local_corruption[debug-pg14]/repo/endpoints/ep-3/spec.json', '--pgbin', '/tmp/neon/pg_install/v14/bin/postgres', '']

Was this an injected failure? No.

This looks similar to what we discovered in #6270 (comment): when endpoint start fails, it leaves a stray compute_ctl around for some time.

@koivunej
Member

If endpoint startup fails, we could probably just leave out the sync-safekeepers because no LSN should have changed, but perhaps that's a difficult call to make. Alternatively, that particular assert could be formulated as "basebackup fails".

Well, looking at control_plane/src/endpoint.rs, this is quite clearly a problem of just not waiting for the spawned postgres to stop, similar to #6474. But we probably cannot just kill it; we have to let the sync-safekeepers go through, as we cannot know that it was basebackup which failed.
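
The general pattern of "wait for the child to finish, and only kill it as a last resort" can be sketched as follows. This is Python rather than the Rust in control_plane, and the function names and the grace period are assumptions made for illustration.

```python
import subprocess


def stop_child(child: subprocess.Popen, grace_seconds: float) -> int:
    """Give the child time to finish on its own, then SIGKILL and reap it."""
    try:
        return child.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        child.kill()
        return child.wait()  # reap, so no zombie or stray process remains


def start_with_status_check(argv, status_ok, grace_seconds: float = 5.0):
    """Spawn argv; if the status check fails or raises, do not leak the child."""
    child = subprocess.Popen(argv)
    try:
        if not status_ok():
            raise RuntimeError("status check failed")
    except BaseException:
        stop_child(child, grace_seconds)
        raise
    return child
```

The key property is that every error path after `Popen` goes through `stop_child`, so the caller never returns while the spawned process is still running.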

Comment on lines 51 to 54
# Add nonroot user
RUN useradd -ms /bin/bash nonroot -b /home
SHELL ["/bin/bash", "-c"]
RUN echo "ALL ALL = (ALL) NOPASSWD: ALL" >> /etc/sudoers
Contributor Author

TODO: security review

@@ -420,19 +420,22 @@ jobs:
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:${{ needs.build-buildtools-image.outputs.build-tools-tag }}
# Default shared memory is 64mb
-options: --init --shm-size=512mb
+options: --init --shm-size=512mb --cgroupns=private --privileged
Contributor Author

TODO security review

Contributor Author

Copy-pasting from #6474 here; as multiple TODO comments in this file
indicate, we should really be using background_process::start_process
for compute_ctl => #6482
@problame problame marked this pull request as ready for review January 26, 2024 10:26
@problame problame requested review from a team as code owners January 26, 2024 10:26
@problame problame requested review from save-buffer and removed request for a team and save-buffer January 26, 2024 10:26
@areyou1or0

areyou1or0 commented Jan 26, 2024

Dropping the same comment I dropped on Slack:

But fundamentally, I think the --privileged is more risky, and I can't go without that (it's needed so the cgroup2fs mount is rw).

I suggest we implement an alternative - this increases privilege escalation risks majorly. It provides almost the same capabilities for the container as the host machine. There are so many ways to exploit this, implementing this would pose a big security threat. (via container escape, privilege escalation to cgroup manipulation, execute arbitrary commands, lateral movements etc. etc.)

grant passwdless sudo to the nonroot user

As in the scenario above, if an attacker gains access to the container, they can simply abuse the sudo privileges of the non-root user. This bypasses any security best practices.
You can provide sudo privileges to the specific commands only. We should definitely avoid sudo on nonroot user as this will basically make any non-root users run with root rights.
Instead of --privileged, why not use more fine-grained capabilities? Or perhaps use --mount on the necessary filesystems?

This change does NOT pass security review - please implement an alternative. @problame

@areyou1or0 areyou1or0 self-requested a review January 26, 2024 11:00

@areyou1or0 areyou1or0 left a comment

You can provide sudo privileges to the specific commands only. We should definitely avoid sudo on nonroot user as this will basically make any non-root users run with root rights.
Instead of --privileged, why not use more fine-grained capabilities? Or perhaps use --mount on the necessary filesystems?

@problame
Contributor Author

problame commented Jan 26, 2024

So, here's my hack to remount the cgroupfs in the container as rw and delegate a cgroup subtree to our nonroot user.


Create a docker container.

cs@devvm-mbp:[~/src/neon-work-1/test]: docker run --rm -it --cgroupns=private test bash
nonroot@a2351caca14e:~$ echo $$
1

So, the bash process is PID 1 in the container's pid namespace.
What's its PID on the host?

cs@devvm-mbp:[~/src/neon-work-1/test]: docker inspect a2351caca14e
[
    {
        "Id": "a2351caca14eb2eb33ff08ff0be428012a252d0383c87262640e3fcaea1153f5",
        "Created": "2024-01-26T16:18:09.401859811Z",
        "Path": "bash",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 2992050,
...

So, PID 1 in the container is PID 2992050 in the host PID namespace.
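
The lookup above can be scripted. The sketch below parses the JSON array that `docker inspect` prints and builds the `nsenter` command used in the next step; the helper names are made up for illustration.

```python
import json


def host_pid_from_inspect(inspect_json: str) -> int:
    """Extract State.Pid of the first container from `docker inspect` output."""
    containers = json.loads(inspect_json)
    return int(containers[0]["State"]["Pid"])


def nsenter_argv(host_pid: int) -> list[str]:
    """Command line that enters all namespaces of the given host PID as root."""
    return ["sudo", "nsenter", "--target", str(host_pid), "--all"]
```

`docker inspect --format '{{.State.Pid}}' <container>` prints the same value directly, which avoids the JSON parsing when only the PID is needed.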

  • Enter that process's namespaces with our root privileges
  • remount the cgroupfs in the container as rw
  • create a cgroup neon_testsuite that's delegated to our nonroot user
  • move the container PID 1 into that cgroup
  • exit
cs@devvm-mbp:[~/src/neon-work-1/test]: sudo nsenter --target 2992050 --all
root@a2351caca14e:/# mount -o remount,rw /sys/fs/cgroup
root@a2351caca14e:/# mkdir /sys/fs/cgroup/neon_testsuite
root@a2351caca14e:/# chown nonroot:nonroot /sys/fs/cgroup/neon_testsuite
root@a2351caca14e:/# chown nonroot:nonroot /sys/fs/cgroup/neon_testsuite/cgroup.procs
root@a2351caca14e:/# echo 1 > /sys/fs/cgroup/neon_testsuite/cgroup.procs 
root@a2351caca14e:/# exit
logout
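
The delegation steps from the root shell above (mkdir, two chowns, write PID 1) could be expressed in Python roughly as follows. This is a sketch under the assumption that it runs with enough privileges inside the container's namespaces, after the remount; `delegate_cgroup` is a hypothetical helper, not part of this PR.

```python
import os
from pathlib import Path


def delegate_cgroup(cgroup_root: Path, name: str, uid: int, gid: int, pid: int) -> Path:
    """Create <cgroup_root>/<name>, hand it to uid:gid, and move `pid` into it."""
    cgroup_dir = cgroup_root / name
    cgroup_dir.mkdir(exist_ok=True)
    procs = cgroup_dir / "cgroup.procs"  # created by the kernel on a cgroup2 mount
    # Owning the directory allows creating child cgroups; owning cgroup.procs
    # allows moving processes into this cgroup.
    os.chown(cgroup_dir, uid, gid)
    os.chown(procs, uid, gid)
    procs.write_text(f"{pid}\n")
    return cgroup_dir
```

Only the `neon_testsuite` subtree changes owner here; the files of the root cgroup stay owned by root.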

Now our nonroot user can operate on its delegated cgroup.

nonroot@a2351caca14e:~$ mkdir /sys/fs/cgroup/neon_testsuite/foo
nonroot@a2351caca14e:~$ mkdir /sys/fs/cgroup/neon_testsuite/foo^C
nonroot@a2351caca14e:~$ sleep 100000&
[1] 16
nonroot@a2351caca14e:~$ echo 16 > /sys/fs/cgroup/neon_testsuite/foo/cgroup.procs 
nonroot@a2351caca14e:~$ 

But it cannot operate on the root cgroup because it's owned by root

nonroot@a2351caca14e:~$ mkdir /sys/fs/cgroup/foo
mkdir: cannot create directory '/sys/fs/cgroup/foo': Permission denied
nonroot@a2351caca14e:~$ ls -lah /sys/fs/cgroup/
total 0
drwxr-xr-x 3 root    root    0 Jan 26 16:19 .
drwxr-xr-x 7 root    root    0 Jan 26 16:18 ..
-r--r--r-- 1 root    root    0 Jan 26 16:19 cgroup.controllers
-r--r--r-- 1 root    root    0 Jan 26 16:18 cgroup.events
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cgroup.freeze
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cgroup.max.depth
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cgroup.max.descendants
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cgroup.procs
-r--r--r-- 1 root    root    0 Jan 26 16:19 cgroup.stat
-rw-r--r-- 1 root    root    0 Jan 26 16:18 cgroup.subtree_control
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cgroup.threads
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cgroup.type
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cpu.max
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cpu.pressure
-r--r--r-- 1 root    root    0 Jan 26 16:19 cpu.stat
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cpu.weight
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cpu.weight.nice
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cpuset.cpus
-r--r--r-- 1 root    root    0 Jan 26 16:19 cpuset.cpus.effective
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cpuset.cpus.partition
-rw-r--r-- 1 root    root    0 Jan 26 16:19 cpuset.mems
-r--r--r-- 1 root    root    0 Jan 26 16:19 cpuset.mems.effective
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.1GB.current
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.1GB.events
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.1GB.events.local
-rw-r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.1GB.max
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.1GB.rsvd.current
-rw-r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.1GB.rsvd.max
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.2MB.current
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.2MB.events
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.2MB.events.local
-rw-r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.2MB.max
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.2MB.rsvd.current
-rw-r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.2MB.rsvd.max
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.32MB.current
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.32MB.events
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.32MB.events.local
-rw-r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.32MB.max
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.32MB.rsvd.current
-rw-r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.32MB.rsvd.max
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.64KB.current
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.64KB.events
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.64KB.events.local
-rw-r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.64KB.max
-r--r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.64KB.rsvd.current
-rw-r--r-- 1 root    root    0 Jan 26 16:19 hugetlb.64KB.rsvd.max
-rw-r--r-- 1 root    root    0 Jan 26 16:19 io.max
-rw-r--r-- 1 root    root    0 Jan 26 16:19 io.pressure
-r--r--r-- 1 root    root    0 Jan 26 16:19 io.stat
-rw-r--r-- 1 root    root    0 Jan 26 16:19 io.weight
-r--r--r-- 1 root    root    0 Jan 26 16:19 memory.current
-r--r--r-- 1 root    root    0 Jan 26 16:18 memory.events
-r--r--r-- 1 root    root    0 Jan 26 16:19 memory.events.local
-rw-r--r-- 1 root    root    0 Jan 26 16:19 memory.high
-rw-r--r-- 1 root    root    0 Jan 26 16:19 memory.low
-rw-r--r-- 1 root    root    0 Jan 26 16:19 memory.max
-rw-r--r-- 1 root    root    0 Jan 26 16:19 memory.min
-r--r--r-- 1 root    root    0 Jan 26 16:19 memory.numa_stat
-rw-r--r-- 1 root    root    0 Jan 26 16:19 memory.oom.group
-rw-r--r-- 1 root    root    0 Jan 26 16:19 memory.pressure
-r--r--r-- 1 root    root    0 Jan 26 16:19 memory.stat
-r--r--r-- 1 root    root    0 Jan 26 16:19 memory.swap.current
-r--r--r-- 1 root    root    0 Jan 26 16:19 memory.swap.events
-rw-r--r-- 1 root    root    0 Jan 26 16:19 memory.swap.high
-rw-r--r-- 1 root    root    0 Jan 26 16:19 memory.swap.max
drwxr-xr-x 3 nonroot nonroot 0 Jan 26 16:19 neon_testsuite
-r--r--r-- 1 root    root    0 Jan 26 16:19 pids.current
-r--r--r-- 1 root    root    0 Jan 26 16:19 pids.events
-rw-r--r-- 1 root    root    0 Jan 26 16:19 pids.max
-r--r--r-- 1 root    root    0 Jan 26 16:19 rdma.current
-rw-r--r-- 1 root    root    0 Jan 26 16:19 rdma.max

problame added a commit that referenced this pull request Jan 26, 2024
Epic: #6485

Before this PR, some tests would leak child processes.
We found them using the approach in
#6470.

This PR fixes the findings because PR#6470 is being delayed due to
security concerns.
@problame problame self-assigned this Feb 6, 2024
@problame
Contributor Author

problame commented Feb 6, 2024

@bayandin bayandin removed their request for review June 4, 2024 12:29